Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lockfile invalidation -- add metadata to generated lockfiles #12427

Merged

Conversation

chrisjrn
Copy link
Contributor

This WIP adds the lockfile invalidation header, but does not yet consume it.

Partially addresses #12415.

@chrisjrn chrisjrn marked this pull request as draft July 26, 2021 21:00
@chrisjrn
Copy link
Contributor Author

@Eric-Arellano Happy to take pointers here, otherwise, fyi!

@chrisjrn
Copy link
Contributor Author

chrisjrn commented Jul 26, 2021

2021-07-26 at 2 03 PM

(It seems to work?)

@chrisjrn
Copy link
Contributor Author

Notably, this implementation should make it easy to add the usage instructions automagically, by amending this line: https://github.com/pantsbuild/pants/pull/12427/files#diff-e09a7e81d00d6394fbb4336f41dde1ba1aea4b0b1af07d173124ecb65b9e9308R155

Copy link
Contributor

@Eric-Arellano Eric-Arellano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yay!

Comment on lines 98 to 117
m = hashlib.sha256()
for requirement in self.requirements:
m.update(requirement.encode("utf-8"))
for constraint in self.interpreter_constraints:
m.update(str(constraint).encode("utf-8"))
return m.hexdigest()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking out loud here..my first thought was "This seems inefficient to iterate over every element! You could hash the whole collection to avoid that".

But I suspect this is intentional to iterate over every element, and that it makes the algorithm more robust to hash collisions? Is that right? If so, maybe add a comment explaining this nuance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Secure hashes from hashlib work on bytes objects -- you can't hash an iterable, but you can hash a bytes representation of every item of an iterable. I don't think there's any nuance here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, @stuhood thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returning to this, under the hood, m.update(a + b) is equivalent to m.update(a); m.update(b) (see https://docs.python.org/3/library/hashlib.html) so I presume that hashlib is doing something with stringio or similar under the hood to build up the input to the digest function, which is about as efficient as what we'd end up doing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, I'm wondering if we could do m.update(repr(self.requiremenets).encode("utf-8")), so that would not quite be the same thing. I'm pretty sure that's more likely to have a hash collision, only not sure it matters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the repr of the collection would be equivalent to adding a bit of salt to the hashes; I don't think either approach meaningfully changes the probability of a collision. The chance of a collision over a 256-bit cryptographic hash is vanishingly small either way, so the presence/otherwise of a few delimiters isn't going to meaningfully change that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing to consider in this is sort order. Are the requirements/constraints sorted when going into this, otherwise a change of order would yield different hashes, which I suppose is undesirable, as they would still resolve to the same thing once used, iiuc.

src/python/pants/backend/experimental/python/lockfile.py Outdated Show resolved Hide resolved
src/python/pants/backend/experimental/python/lockfile.py Outdated Show resolved Hide resolved
src/python/pants/backend/experimental/python/lockfile.py Outdated Show resolved Hide resolved
Copy link
Contributor

@Eric-Arellano Eric-Arellano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding tests!

src/python/pants/backend/experimental/python/lockfile.py Outdated Show resolved Hide resolved
src/python/pants/backend/experimental/python/lockfile.py Outdated Show resolved Hide resolved
src/python/pants/backend/experimental/python/lockfile.py Outdated Show resolved Hide resolved
Comment on lines 98 to 117
m = hashlib.sha256()
for requirement in self.requirements:
m.update(requirement.encode("utf-8"))
for constraint in self.interpreter_constraints:
m.update(str(constraint).encode("utf-8"))
return m.hexdigest()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, @stuhood thoughts?

Copy link
Contributor

@Eric-Arellano Eric-Arellano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Thanks.

The lockfiles should maybe be regenerated? But it's a tedious process..you have to run on a machine w/ Py36-Py39 (not M1) and manually restore the headers for how to regen. I'm personally fine with punting on it for now and one of us implementing #12382 first.

Comment on lines 98 to 117
m = hashlib.sha256()
for requirement in self.requirements:
m.update(requirement.encode("utf-8"))
for constraint in self.interpreter_constraints:
m.update(str(constraint).encode("utf-8"))
return m.hexdigest()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, I'm wondering if we could do m.update(repr(self.requiremenets).encode("utf-8")), so that would not quite be the same thing. I'm pretty sure that's more likely to have a hash collision, only not sure it matters.

@chrisjrn
Copy link
Contributor Author

@Eric-Arellano Once the lockfile digests are being consumed, we can potentially add the regen information to the lockfile_metadata_header function and related tests; I don't feel like we need to regenerate the lockfiles right away

@chrisjrn chrisjrn changed the title WIP: Lockfile invalidation Lockfile invalidation Jul 28, 2021
@chrisjrn chrisjrn changed the title Lockfile invalidation Lockfile invalidation -- add metadata to generated lockfiles Jul 28, 2021
Copy link
Contributor

@Eric-Arellano Eric-Arellano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only small nits. Sounds like you're headed out for today, so I'll plan to merge this and then they can be fixed in your upcoming followup PR.

Thanks!

"""Returns an invalidation digest for the given requirements and interpreter constraints."""
m = hashlib.sha256()
pres = {
"requirements": [str(i) for i in requirements],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They're already strings

Suggested change
"requirements": [str(i) for i in requirements],
"requirements": list(requirements),

Comment on lines 64 to 68
print(
invalidation_digest(
FrozenOrderedSet(requirements), InterpreterConstraints(interpreter_constraints)
)
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stale

)


def test_hash_depends_on_requirement_source():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def test_hash_depends_on_requirement_source():
def test_hash_depends_on_requirement_source() -> None:

@chrisjrn
Copy link
Contributor Author

@Eric-Arellano I've fixed those

Christopher Neugebauer added 11 commits July 30, 2021 08:44
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
[ci skip-rust]

[ci skip-build-wheels]
[ci skip-rust]

[ci skip-build-wheels]
[ci skip-rust]

[ci skip-build-wheels]
[ci skip-rust]

[ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
… be simpler, I promise.

# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
Christopher Neugebauer added 5 commits July 30, 2021 08:44
[ci skip-rust]

[ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
@chrisjrn chrisjrn force-pushed the chrisjrn/12415-lockfile-invalidation branch from 2ab468d to 3694ae3 Compare July 30, 2021 15:44
@Eric-Arellano Eric-Arellano merged commit 6ee8f27 into pantsbuild:main Jul 30, 2021
Eric-Arellano pushed a commit that referenced this pull request Aug 12, 2021
Closes #12415

This adds tools for consuming the lockfile invalidation digests that were added in #12427, both for lockfiles specified for the project, and for individual build tools.

Behavior on invalid lockfiles can be configured to either be a warning or an error.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants